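Recommender systems (RSs) aim to model and predict user preferences as users interact with items such as points of interest (POIs). These systems face several challenges, such as data sparsity, that limit their effectiveness. In this paper, we address this problem by incorporating social, geographical, and temporal information into the matrix factorization (MF) technique. To this end, we model social influence based on two factors: the similarity between users in terms of common check-ins, and the friendships between them. We introduce two levels of friendship, based on explicit friendship networks and on high check-in overlap between users. Our friendship algorithm is based on users' geographical activity centers. The results show that our proposed model outperforms the state of the art on two real-world datasets. More specifically, our ablation study shows that the social model improves the performance of our proposed POI recommendation system in terms of Precision@10 on both the Gowalla and Yelp datasets.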
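As a rough illustration of the "high check-in overlap" notion of friendship, the sketch below scores the overlap between two users' visited POIs. The abstract does not say how overlap is measured, so the Jaccard index, the `threshold` value, and the function names `checkin_overlap` and `implicit_friends` are all assumptions made for illustration, not the authors' method.

```python
def checkin_overlap(checkins_a, checkins_b):
    """Jaccard overlap between two users' sets of visited POI ids.

    Assumption: overlap is measured with the Jaccard index; the source
    abstract does not specify the measure.
    """
    a, b = set(checkins_a), set(checkins_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def implicit_friends(user_checkins, threshold=0.5):
    """Return user pairs whose check-in overlap meets the threshold.

    `threshold` is a hypothetical cut-off chosen for this example.
    """
    users = sorted(user_checkins)
    pairs = []
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if checkin_overlap(user_checkins[u], user_checkins[v]) >= threshold:
                pairs.append((u, v))
    return pairs
```

For example, users who visited POIs `[1, 2, 3]` and `[2, 3, 4]` share two of four distinct places, so they would count as implicit friends at a threshold of 0.5.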
translated by 谷歌翻译
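Human action recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications and has therefore attracted increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signals, which encode different sources of useful yet distinct information and have various advantages depending on the application scenario. Consequently, many existing works have investigated different types of approaches for HAR using various modalities. In this paper, we present a comprehensive survey of recent progress in deep learning methods for HAR, organized by the type of input data modality. Specifically, we review the current mainstream deep learning methods for single data modalities and for multiple data modalities, including fusion-based and co-learning-based frameworks. We also present comparative results on several benchmark datasets for HAR, together with insightful observations and inspiring future research directions.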
According to WHO statistics, many people suffer from visual impairments, and their number is increasing every year. One of their most critical needs is the ability to navigate safely, which is why researchers are working to create and improve navigation systems. This paper presents a navigation concept based on visual SLAM and YOLO using monocular cameras. Using the ORB-SLAM algorithm, our concept builds a map of a predefined route that a blind person uses most often. Because visually impaired people are curious about their surroundings and need to be guided properly, obstacle detection has been added to the system. Since safe navigation is vital for visually impaired people, our concept also includes a path-following part. This part consists of three steps, obstacle distance estimation, path deviation detection, and next-step prediction, all performed with monocular cameras.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
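To make the patch-based strategy mentioned above concrete, the following sketch computes the bounding boxes of fixed-size patches that tile a 2D image too large to process at once. It is a generic illustration rather than any participant's pipeline; the function names `patch_starts` and `patch_boxes`, and the choice to shift the last patch back so it stays inside the image, are assumptions.

```python
def patch_starts(length, patch, stride):
    """Start offsets of patches covering [0, length) along one axis.

    The final patch is shifted back so it ends exactly at `length`
    (an assumed convention; padding would be another option).
    """
    if stride <= 0 or patch <= 0:
        raise ValueError("patch and stride must be positive")
    if patch > length:
        raise ValueError("patch is larger than the image axis")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def patch_boxes(height, width, patch, stride):
    """Return (top, left, bottom, right) boxes of square patches."""
    return [
        (y, x, y + patch, x + patch)
        for y in patch_starts(height, patch, stride)
        for x in patch_starts(width, patch, stride)
    ]
```

With a stride smaller than the patch size the boxes overlap, which is a common way to reduce border artifacts when patch predictions are stitched back together.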
Monocular 3D human pose estimation is quite challenging due to the inherent ambiguity and occlusion, which often lead to high uncertainty and indeterminacy. On the other hand, diffusion models have recently emerged as an effective tool for generating high-quality images from noise. Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. We incorporate novel designs into our DiffPose that facilitate the diffusion process for 3D pose estimation: a pose-specific initialization of pose uncertainty distributions, a Gaussian Mixture Model-based forward diffusion process, and a context-conditioned reverse diffusion process. Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP.
Autonomous drones, or unmanned aerial vehicles (UAVs), have shown great advantages over earlier methods in urgent scenarios such as search and rescue (SAR) and wildfire detection. In these operations, search efficiency, measured as the time spent finding the target, is crucial: as time passes, the survivability of a missing person decreases, and wildfire management becomes more difficult, with disastrous consequences. In this work, we consider a scenario in which a drone searches for and detects a missing person (e.g., a hiker or a mountaineer) or a potential fire spot in a given area. To obtain the shortest path to the target, we provide a general framework that models the problem of target detection when the target's location is probabilistically known. To this end, two algorithms are proposed: path planning and target detection. The path planning algorithm is based on Bayesian inference, and target detection is accomplished by a residual neural network (ResNet) trained on an image dataset captured by the drone together with existing pictures and datasets from the web. Through simulation and experiment, the proposed path planning algorithm is compared with two benchmark algorithms. The proposed algorithm is shown to significantly decrease the average mission time.
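The idea of Bayesian inference over a probabilistically known target location can be sketched with the textbook Bayesian search update: after a cell is searched without a detection, its probability drops and the remaining cells are renormalized upward. The abstract does not give the authors' exact formulation, so the grid-of-cells belief, the detection probability `p_detect`, and the greedy `next_cell` rule below are assumptions for illustration only.

```python
def update_after_miss(belief, cell, p_detect):
    """Posterior over cells after searching `cell` and finding nothing.

    belief   : list of prior probabilities that the target is in each cell
    p_detect : assumed probability of detecting the target when the
               searched cell actually contains it
    """
    miss_prob = 1.0 - belief[cell] * p_detect  # P(no detection)
    if miss_prob <= 0.0:
        raise ValueError("a miss is impossible under this belief")
    posterior = [p / miss_prob for p in belief]
    posterior[cell] = belief[cell] * (1.0 - p_detect) / miss_prob
    return posterior


def next_cell(belief):
    """Greedy choice (an assumption): visit the most probable cell next."""
    return max(range(len(belief)), key=lambda i: belief[i])
```

Starting from a belief of `[0.5, 0.3, 0.2]` and a miss in cell 0 with `p_detect=0.8`, the mass in cell 0 falls to about 0.17 and cell 1 becomes the most probable location. A real planner would also weigh travel time between cells, which this sketch ignores.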
Transformers have become central to recent advances in computer vision. However, training a vision Transformer (ViT) model from scratch can be resource intensive and time consuming. In this paper, we explore approaches to reduce the training costs of ViT models. We introduce algorithmic improvements that enable training a ViT model from scratch with limited hardware (1 GPU) and time (24 hours) resources. First, we propose an efficient approach to add locality to the ViT architecture. Second, we develop a new image size curriculum learning strategy, which reduces the number of patches extracted from each image at the beginning of training. Finally, we propose a new variant of the popular ImageNet1k benchmark by adding hardware and time constraints. We evaluate our contributions on this benchmark and show that they can significantly improve performance within the proposed training budget. We will share the code at https://github.com/BorealisAI/efficient-vit-training.
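The link between image size and patch count can be illustrated with a small schedule: smaller images early in training yield fewer patches per image, and therefore cheaper steps. The abstract does not describe the actual schedule, so the linear ramp, the 128 and 224 pixel sizes, the 16 pixel patch, and the function names below are assumptions, not the authors' settings.

```python
def image_size_at(epoch, total_epochs, start_size=128, final_size=224, patch=16):
    """Image side length for a given epoch under an assumed linear ramp.

    The result is rounded down to a multiple of `patch` so the image
    splits evenly into patches.
    """
    if total_epochs < 2:
        return final_size
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    size = start_size + frac * (final_size - start_size)
    return int(size // patch) * patch


def num_patches(image_size, patch=16):
    """Number of non-overlapping square patches in a square image."""
    return (image_size // patch) ** 2
```

Under these assumed values, training would start with 64 patches per image at 128 pixels and end with 196 patches at 224 pixels, so early epochs process roughly a third of the tokens of the final ones.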
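The visual microscopic study of diseased tissue by pathologists has been the cornerstone of cancer diagnosis and prognosis for more than a century. Recently, deep learning methods have made significant advances in the analysis and classification of tissue images. However, there has been limited work on the utility of such models for generating histopathology images. Such synthetic images have several applications in pathology, including uses in education, proficiency testing, privacy, and data sharing. Recently, diffusion probabilistic models were introduced to generate high-quality images. Here, for the first time, we investigate the potential use of such models, together with prioritized morphology weighting and color normalization, to synthesize high-quality histopathology images of brain cancer. Our detailed results show that diffusion probabilistic models are capable of synthesizing a wide range of histopathology images and achieve superior performance compared to generative adversarial networks.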
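The goal of fine-grained recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we draw inspiration from the human visual system, which contains specialized regions in the brain dedicated to handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are activated only for a subset of highly similar samples. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences in order to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or more temporal fine-grained information, to better handle the large range of spatio-temporal variations in videos. Lastly, we design an upstream-downstream learning algorithm to optimize our model's dynamic decisions during training, improving the performance of the DSTS module. We obtain state-of-the-art performance on two widely used fine-grained recognition datasets.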
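It is well known that huge amounts of text data have always been a key requirement for training deep models (e.g., Transformer-based ones). This problem is more pronounced in lower-resource languages such as Farsi. We present Naab, the largest cleaned and ready-to-use open-source text corpus in Farsi. It contains about 130GB of data, 250 million paragraphs, and 15 billion words. The project name is derived from the Farsi word Naab, which means pure and high-grade. We also provide the raw version of the corpus, called Naab-Raw, and an easy-to-use preprocessor that can be used by those who want to build a customized corpus.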